A substantial volume of research has been devoted to studies of community structure in networks, 
but communities are not the only possible form of large-scale network structure. Here we describe 
a broad extension of community structure that encompasses traditional communities but includes a 
wide range of generalized structural patterns as well. We describe a principled method for detecting 
this generalized structure in empirical network data and demonstrate with real-world examples how 
it can be used to learn new things about the shape and meaning of networks. 


The detection and analysis of large-scale structure in networks has been the
subject of a vigorous research effort in recent years, in part because of the
highly successful application of ideas drawn from statistical physics [1, 2].
Particular energy has been devoted to the study of community structure, meaning
the division of networks into densely connected subgroups, a common and
revealing feature, especially in social and biological networks [3]. Community
structure is, however, only one of many possibilities where real-world networks
are concerned. In this paper, we describe a broad generalization of community
structure that encompasses not only traditional communities but also
overlapping or fuzzy communities, ranking or stratified structure, geometric
networks, and a range of other structural types, yet is easily and flexibly
detected using a fast, mathematically principled procedure which we describe.
We give demonstrative applications of our approach to both computer-generated
test networks and real-world examples.

Community structure can be thought of as a division of the nodes of a network
into disjoint groups such that the probability of an edge is higher between
nodes in the same group than between nodes in different groups. For instance,
one can generate artificial networks with community structure using the
stochastic block model, a mathematical model that follows exactly this
principle. In the stochastic block model the nodes of a network are divided
into $k$ groups, with a node being assigned to group $r$ with some probability
$\gamma_r$ for $r = 1 \ldots k$, and then edges are placed between node pairs
independently with probabilities $p_{rs}$, where $r$ and $s$ are the groups the
nodes fall in. If the diagonal probabilities $p_{rr}$ are larger than the
off-diagonal ones, we get traditional community structure.
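The generative process just described can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `sample_sbm`, the group probabilities, and the numerical edge probabilities are hypothetical choices and are not taken from the paper.

```python
import numpy as np

def sample_sbm(n, gamma, p, rng):
    """Draw one undirected network from a stochastic block model.

    n     : number of nodes
    gamma : length-k array, gamma[r] = probability that a node joins group r
    p     : k-by-k symmetric array, p[r, s] = edge probability between groups
    """
    groups = rng.choice(len(gamma), size=n, p=gamma)
    # Edge probability for every node pair, looked up from the group labels.
    probs = p[groups[:, None], groups[None, :]]
    # Decide each pair u < v independently, then mirror so A is symmetric.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    adjacency = (upper | upper.T).astype(int)
    return groups, adjacency

# Illustrative values only: three groups with larger diagonal probabilities.
rng = np.random.default_rng(0)
gamma = np.array([1 / 3, 1 / 3, 1 / 3])
p = np.full((3, 3), 0.01) + np.diag([0.09, 0.09, 0.09])
groups, A = sample_sbm(300, gamma, p, rng)
```

Because the diagonal entries of `p` exceed the off-diagonal ones, the sampled network has the traditional (assortative) community structure described above.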

Alternatively, however, one can also look at the stochastic block model another
way: imagine that we assign each node a random node parameter $x$ between zero
and one, and edges are placed between node pairs with a probability
$\omega(x, y)$ that is a function of the node parameters $x$ and $y$ of the
pair. If $\omega(x, y)$ is piecewise constant with rectangular regions of size
$\gamma_r \times \gamma_s$ and value $p_{rs}$, then this model is precisely
equivalent to the traditional block model. But this prompts us to ask: what is
so special about piecewise constant functions? It is certainly possible that
some networks might contain structure that is better captured by functions
$\omega(x, y)$ of other forms. Why not let $\omega(x, y)$ take a more general
functional form, thereby creating a generalized type of community structure
that includes the traditional type as a subset but can also capture other
structures as well? This is the fundamental idea behind the generalized
structures of this paper: edge probabilities are arbitrary functions of
continuous node parameters.
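To make the equivalence concrete, the sketch below samples a network from an arbitrary edge-probability function of the node parameters and then builds the piecewise-constant function that reproduces a block model. The helper names (`sample_latent_network`, `block_omega`, `smooth_omega`) and all numerical values are assumptions made for illustration, and the smooth function is simply one possible non-block alternative.

```python
import numpy as np

def sample_latent_network(n, omega, rng):
    """Assign each node x ~ Uniform[0, 1] and connect pairs with prob. omega(x, y)."""
    x = rng.random(n)
    probs = omega(x[:, None], x[None, :])
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    return x, (upper | upper.T).astype(int)

def block_omega(gamma, p):
    """Piecewise-constant omega with rectangles of size gamma_r x gamma_s, value p_rs."""
    boundaries = np.cumsum(gamma)[:-1]  # interior group boundaries in [0, 1]
    def omega(x, y):
        r = np.searchsorted(boundaries, x, side="right")
        s = np.searchsorted(boundaries, y, side="right")
        return p[r, s]
    return omega

def smooth_omega(x, y):
    # Hypothetical smooth alternative: probability decays with |x - y|.
    return 0.2 * np.exp(-10.0 * np.abs(x - y))

gamma = np.array([0.5, 0.3, 0.2])
p = np.array([[0.10, 0.01, 0.02],
              [0.01, 0.12, 0.01],
              [0.02, 0.01, 0.15]])
rng = np.random.default_rng(0)
x_block, A_block = sample_latent_network(200, block_omega(gamma, p), rng)
x_smooth, A_smooth = sample_latent_network(200, smooth_omega, rng)
```

With `block_omega` the node parameter only matters through the interval it falls in, which is the block model; with `smooth_omega` the same sampler produces a geometric, latent-space type of network.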

The idea is related to two threads of work in the previous literature. One, in
sociology and statistics, concerns “latent space” models, in which nodes in a
network are located somewhere in a Euclidean space and are more likely to be
connected if they are spatially close than if they are far apart [4]. A
specific functional form is typically assumed for the connection probability
and the model is fitted to data using Monte Carlo methods. The other thread, in
the mathematics literature, concerns so-called “graphon” models and does not
deal with the analysis of empirical data but with the mathematical properties
of models, showing in particular that models of a kind similar to that
described here are powerful enough to capture the properties of any theoretical
ensemble of networks in the limit of large size, at least in the case where the
networks are dense [5, 6].

In this paper, we define a specific model of generalized community structure
and a method for fitting it to empirical data using Bayesian inference. The fit
places each node of the network in the “latent space” of the node parameters
$x$ and, simultaneously, gives us an estimate of the probability function
$\omega(x, y)$. Between them, these two outputs tell us a great deal about the
structure a network possesses and the role each node plays within that
structure. The method is computationally efficient, allowing for its
application to large networks, and provides significantly more insight than the
traditional community division into discrete groups, or even recent
generalizations to overlapping groups [7, 8].

We begin by defining a model that generates networks with the generalized
community structure we are interested in. The model follows the lines sketched
above, but with some crucial differences. We take $n$ nodes and for each node
$u$ we generate a node parameter $x_u$ uniformly at random in the interval
$[0, 1]$. Then between each pair of nodes $u, v$ we place an undirected edge
with probability

$$ p_{uv} = \frac{d_u d_v}{2m}\, \omega(x_u, x_v), \tag{1} $$

where $d_u, d_v$ are the degrees of the nodes, $m = \frac{1}{2} \sum_u d_u$ is
the total number of edges in the network, and $\omega(x, y)$ is a function of
our choosing, which we will call the edge function. Note that $\omega(x, y)$
must be symmetric with respect to its arguments for an undirected network such
as this.

The inclusion of the degrees is necessary to accommodate the fitting of
networks with broad degree distributions (which includes essentially all
real-world networks [9, 10]). Without it, the model effectively assumes a
Poisson degree distribution, which is a poor fit to most networks and can cause
the calculation to fail [11]. The factor $d_u d_v / 2m$ is the probability of
an edge between nodes with degrees $d_u, d_v$ if edges are placed at
random [12]. Hence $\omega(x_u, x_v)$ parametrizes the variation of the
probability relative to this baseline level and is typically of order 1, making
$p_{uv}$ small in the limit where $m$ becomes large.

Given the model, we fit it to empirical network data using the method of
maximum likelihood. The probability or likelihood
$P(\mathbf{A}, \mathbf{x}|\omega)$ that we generate a particular set of node
parameters $\mathbf{x} = \{x_u\}$ and a particular network structure described
by the adjacency matrix $\mathbf{A} = \{a_{uv}\}$ is

$$ P(\mathbf{A}, \mathbf{x}|\omega) = \prod_{u<v} p_{uv}^{a_{uv}} (1 - p_{uv})^{1 - a_{uv}}. \tag{2} $$

To find the value of the edge function $\omega(x, y)$ that best fits an
observed network we want to maximize the marginal likelihood

$$ P(\mathbf{A}|\omega) = \int P(\mathbf{A}, \mathbf{x}|\omega)\, d^n x, \tag{3} $$

or equivalently its logarithm, whose maximum falls in the same place. Direct
maximization leads to a set of implicit equations that are hard to solve, even
numerically, so instead we employ the following trick.

For any positive-definite function $f(x)$, Jensen’s inequality says that

$$ \log \int f(x)\, dx \ge \int q(x) \log \frac{f(x)}{q(x)}\, dx, \tag{4} $$

where $q(x)$ is any probability distribution over $x$ such that
$\int q(x)\, dx = 1$. Applying Eq. (4) to the log of the marginal likelihood,
Eq. (3), we get

$$ \log \int P(\mathbf{A}, \mathbf{x}|\omega)\, d^n x \ge \int q(\mathbf{x}) \log \frac{P(\mathbf{A}, \mathbf{x}|\omega)}{q(\mathbf{x})}\, d^n x, \tag{5} $$

where $q(\mathbf{x})$ is any probability distribution over $\mathbf{x}$. It is
straightforward to verify that the exact equality is recovered, and hence the
right-hand side maximized, when

$$ q(\mathbf{x}) = \frac{P(\mathbf{A}, \mathbf{x}|\omega)}{\int P(\mathbf{A}, \mathbf{x}|\omega)\, d^n x}. \tag{6} $$

Further maximization with respect to $\omega$ then gives us the maximum of the
marginal likelihood, which is the result we are looking for. Put another way, a
double maximization of the right-hand side of (5) with respect to both
$q(\mathbf{x})$ and $\omega$ will achieve the desired result. And this double
maximization can be conveniently achieved by alternately maximizing with
respect to $q(\mathbf{x})$ using (6) and with respect to $\omega$ by
differentiating.

This method, which is a standard one in statistics and machine learning, is
called an expectation-maximization or EM algorithm [13]. It involves simply
iterating these two operations from (for instance) a random initial condition
until convergence. The converged value of the probability density
$q(\mathbf{x})$ has a nice physical interpretation. Combining Eqs. (3) and (6),
we have $q(\mathbf{x}) = P(\mathbf{A}, \mathbf{x}|\omega) / P(\mathbf{A}|\omega)
= P(\mathbf{x}|\mathbf{A}, \omega)$. In other words, $q(\mathbf{x})$ is the
posterior probability distribution on the node parameters $\mathbf{x}$ given
the observed network and the edge function $\omega(x, y)$. It tells us the
probability of any given assignment $\mathbf{x}$ of parameters to nodes. It is
this quantity that will in fact be our primary object of interest here.

Substituting from Eqs. (1) and (2) into (5), keeping terms to leading order in
small quantities and dropping overall constants, we can write the quantity to
be maximized as

$$ \iint \sum_{uv} q_{uv}(x, y) \left[ a_{uv} \log \omega(x, y) - \frac{d_u d_v}{2m}\, \omega(x, y) \right] dx\, dy, \tag{7} $$

where $q_{uv}(x, y) = \int q(\mathbf{x})\, \delta(x_u - x)\, \delta(x_v - y)\, d^n x$
is the posterior marginal probability that nodes $u, v$ have node parameters
$x, y$ respectively. The obvious next step is to maximize (7) by functional
differentiation with respect to $\omega(x, y)$, but there is a problem. If we
allow $\omega$ to take any form at all then it has an infinite number of
degrees of freedom, which guarantees overfitting of the data. Put another way,
physical intuition suggests that $\omega(x, y)$ should be smooth in some sense,
and we need a way to impose that smoothness as a constraint on the
optimization. There are a number of ways we could achieve this, but a common
one is to express the function in terms of a finite set of basis functions. For
nonnegative functions such as $\omega$ a convenient basis is the Bernstein
polynomials of degree $N$:

$$ B_k(x) = \binom{N}{k} x^k (1 - x)^{N-k}, \qquad k = 0 \ldots N. \tag{8} $$


The Bernstein polynomials form a complete basis for polynomials of degree $N$
and are nonnegative in $[0, 1]$, so a linear combination
$\sum_{k=0}^{N} c_k B_k(x)$ is also nonnegative provided $c_k \ge 0$ for all
$k$. Our edge function $\omega(x, y)$ is a function of two variables, so we
will write it as a double expansion in Bernstein polynomials

$$ \omega(x, y) = \sum_{j,k=0}^{N} c_{jk} B_j(x) B_k(y), \tag{9} $$

which again is nonnegative for $c_{jk} \ge 0$. Bernstein polynomials have
excellent stability properties under fluctuations of the values of the
expansion coefficients, which makes them ideal for statistical applications
such as ours. Note that since $\omega(x, y)$ is symmetric with respect to its
arguments we must have $c_{jk} = c_{kj}$.
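As an illustration of the expansion, the following sketch evaluates the Bernstein basis of Eq. (8) and the double expansion of Eq. (9) on a set of points. The function names `bernstein_basis` and `edge_function` and the choice $N = 4$ in the usage lines are assumptions of this example.

```python
import numpy as np
from math import comb

def bernstein_basis(x, N):
    """Return an array of shape (N + 1, len(x)) whose row k holds B_k(x), Eq. (8)."""
    x = np.asarray(x, dtype=float)
    k = np.arange(N + 1)[:, None]
    binom = np.array([comb(N, j) for j in range(N + 1)], dtype=float)[:, None]
    return binom * x[None, :] ** k * (1.0 - x[None, :]) ** (N - k)

def edge_function(c, x, y):
    """Evaluate omega(x_i, y_l) = sum_jk c_jk B_j(x_i) B_k(y_l), Eq. (9)."""
    N = c.shape[0] - 1
    Bx = bernstein_basis(x, N)
    By = bernstein_basis(y, N)
    return np.einsum("jk,ji,kl->il", c, Bx, By)

# Usage: a symmetric, nonnegative coefficient matrix gives a symmetric,
# nonnegative edge function on a common grid.
grid = np.linspace(0.0, 1.0, 11)
rng = np.random.default_rng(0)
c = rng.random((5, 5))
c = 0.5 * (c + c.T)
omega_grid = edge_function(c, grid, grid)
```

Since the Bernstein polynomials of a given degree sum to one at every point, setting all $c_{jk} = 1$ gives $\omega(x, y) = 1$ everywhere, which is a convenient sanity check.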

If $\omega(x, y)$ is constrained to take this form, then instead of the
unconstrained maximization of (7) we now want to maximize with respect to the
coefficients $c_{jk}$. To do this, we substitute from (9) into (7) and apply
Jensen’s inequality again, this time in its summation form
$\log \sum_i f_i \ge \sum_i Q_i \log (f_i / Q_i)$. Then, by the same argument
as previously, we find that the optimal coefficient values are given by the
double maximization with respect to $c_{jk}$ and $Q_{jk}(x, y)$ of

$$ \iint \mu(x, y) \sum_{jk} Q_{jk}(x, y) \log \frac{c_{jk} B_j(x) B_k(y)}{Q_{jk}(x, y)}\, dx\, dy - \frac{1}{2m} \sum_{jk} c_{jk} \iint \nu(x)\, \nu(y)\, B_j(x) B_k(y)\, dx\, dy, \tag{10} $$

where

$$ \mu(x, y) = \sum_{uv} a_{uv}\, q_{uv}(x, y), \qquad \nu(x) = \sum_u d_u\, q_u(x), \tag{11} $$

and $q_u(x) = \int_0^1 q_{uv}(x, y)\, dy$ is the marginal probability that node
$u$ has node parameter $x$. The maximization with respect to $Q_{jk}(x, y)$ is
achieved by setting

$$ Q_{jk}(x, y) = \frac{c_{jk} B_j(x) B_k(y)}{\sum_{jk} c_{jk} B_j(x) B_k(y)}, \tag{12} $$

and the maximization with respect to $c_{jk}$ is achieved by differentiating,
which gives

$$ c_{jk} = \frac{2m \iint \mu(x, y)\, Q_{jk}(x, y)\, dx\, dy}{\int \nu(x) B_j(x)\, dx \int \nu(y) B_k(y)\, dy}. \tag{13} $$

Since all quantities on the right of this equation are nonnegative,
$c_{jk} \ge 0$ for all $j, k$ and hence $\omega(x, y) \ge 0$, as required.

The calculation of the optimal values of the $c_{jk}$ is a matter of iterating
Eqs. (12) and (13) to convergence, starting from the best current estimate of
the coefficients. Note that the quantities $\mu$ and $\nu$ need be calculated
only once each time around the EM algorithm, and both can be calculated in time
linear in the size of the network in the common case of a sparse network with
$m \propto n$. The integrals in Eq. (13) we perform numerically, using standard
Gauss-Legendre quadrature.
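A minimal sketch of this inner iteration is given below. It discretizes $x$ and $y$ on Gauss-Legendre nodes mapped to $[0, 1]$ and alternates Eqs. (12) and (13). Everything here is an illustrative assumption rather than the authors' implementation: the function names, the grid size, the stopping rule, and in particular the synthetic inputs `mu` and `nu`, which are built by hand so that the known coefficients `c_true` are a fixed point of the update.

```python
import numpy as np
from math import comb

def bernstein_basis(x, N):
    """Rows are the degree-N Bernstein polynomials B_k evaluated at points x."""
    k = np.arange(N + 1)[:, None]
    binom = np.array([comb(N, j) for j in range(N + 1)], dtype=float)[:, None]
    return binom * x[None, :] ** k * (1.0 - x[None, :]) ** (N - k)

def unit_interval_quadrature(G):
    """Gauss-Legendre nodes and weights rescaled from [-1, 1] to [0, 1]."""
    t, w = np.polynomial.legendre.leggauss(G)
    return 0.5 * (t + 1.0), 0.5 * w

def update_coefficients(c, mu, nu, B, w, m, n_iter=200, tol=1e-12):
    """Iterate Eqs. (12) and (13) on the quadrature grid.

    mu : (G, G) array of mu(x, y) at the grid points
    nu : (G,) array of nu(x) at the grid points
    B  : (N + 1, G) array of Bernstein polynomials at the grid points
    """
    W = w[:, None] * w[None, :]                     # 2D quadrature weights
    basis_int = B @ (w * nu)                        # integral of nu(x) B_j(x)
    denom = np.outer(basis_int, basis_int)
    BB = B[:, None, :, None] * B[None, :, None, :]  # B_j(x_a) B_k(y_b)
    for _ in range(n_iter):
        terms = c[:, :, None, None] * BB
        Q = terms / terms.sum(axis=(0, 1))          # Eq. (12)
        c_new = 2.0 * m * (Q * mu * W).sum(axis=(2, 3)) / denom  # Eq. (13)
        converged = np.max(np.abs(c_new - c)) < tol
        c = c_new
        if converged:
            break
    return c

# Synthetic inputs (assumed for illustration, not derived from a network).
G, N, m = 20, 4, 50.0
t, w = unit_interval_quadrature(G)
B = bernstein_basis(t, N)
rng = np.random.default_rng(1)
c_true = rng.random((N + 1, N + 1)) + 0.1
c_true = 0.5 * (c_true + c_true.T)
omega_true = np.einsum("jk,ja,kb->ab", c_true, B, B)
mu = 2.0 * m * omega_true
nu = np.full(G, 2.0 * m)
c_fit = update_coefficients(np.ones((N + 1, N + 1)), mu, nu, B, w, m)
```

Each pass cannot decrease the discretized objective of Eq. (10), and a symmetric nonnegative starting point keeps the coefficients symmetric and nonnegative, in line with the remark following Eq. (13).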

This, in principle, describes a complete algorithm for fitting the model to
observed network data, but in practice the procedure is cumbersome because of
the denominator of Eq. (6), which involves an $n$-dimensional integral, where
$n$ is the number of nodes in the network, which is typically large. The
traditional solution to this problem is to subsample the distribution
$q(\mathbf{x})$ approximately using Monte Carlo importance sampling. Here,
however, we use a different approach proposed recently by Decelle et al. [14],
which employs belief propagation and returns good results while being markedly
faster than Monte Carlo. The method focuses on a function
$\eta_{u \to v}(x)$, called the belief, which represents the probability that
node $u$ has node parameter $x$ if node $v$ is removed from the network. The
removal of node $v$ allows us to write a self-consistent set of equations whose
solution gives us the beliefs. The equations are a straightforward
generalization to the present model of those given by Decelle et al.:

$$ \eta_{u \to v}(x) = \frac{1}{Z_{u \to v}} \exp\left( -\frac{d_u}{2m} \sum_w d_w \int_0^1 q_w(y)\, \omega(x, y)\, dy \right) \prod_{\substack{w (\ne v) \\ a_{uw} = 1}} \int_0^1 \eta_{w \to u}(y)\, \omega(x, y)\, dy, \tag{14} $$

where $q_w(y)$ is again the marginal posterior probability for node $w$ to have
node parameter $y$ and as before we have dropped terms beyond leading order in
small quantities. The quantity $Z_{u \to v}$ is a normalizing constant which
ensures that the beliefs integrate to unity:

$$ Z_{u \to v} = \int_0^1 \exp\left( -\frac{d_u}{2m} \sum_w d_w \int_0^1 q_w(y)\, \omega(x, y)\, dy \right) \prod_{\substack{w (\ne v) \\ a_{uw} = 1}} \int_0^1 \eta_{w \to u}(y)\, \omega(x, y)\, dy\, dx. \tag{15} $$


The belief propagation method consists of the iteration 
of these equations to convergence starting from a suitable 
initial condition (normally the current best estimate of 
the beliefs). The equations are exact on networks that 
take the form of trees, or on locally tree-like networks in 
the limit of large network size (where local neighborhoods 
of arbitrary size are trees). On other networks, they are 
approximate only, but in practice give excellent results. 
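One sweep of the belief updates might be sketched as follows, with the beliefs stored as vectors on a quadrature grid over $[0, 1]$. Several things here are assumptions of the sketch rather than statements from the text: the function and variable names, the synchronous update order, the factor $1/2m$ in the exponent (taken from the baseline edge probability $d_u d_v / 2m$), and the rule used for the one-node marginal $q_u$, which is taken to be the usual belief-propagation product over all neighbors of $u$.

```python
import numpy as np

def bp_sweep(adj, deg, m, Om, w, eta, q):
    """One synchronous sweep of the belief equations on a quadrature grid.

    adj : dict, node -> list of neighbors
    deg : dict, node -> degree
    Om  : (G, G) array, edge function at the grid points
    w   : (G,) quadrature weights on [0, 1]
    eta : dict, (u, v) -> (G,) belief eta_{u->v}
    q   : dict, u -> (G,) one-node marginal q_u
    """
    nu = sum(deg[u] * q[u] for u in adj)      # sum_w d_w q_w(y)
    mean_field = Om @ (w * nu)                # integral of nu(y) omega(x, y) dy
    new_eta, new_q = {}, {}
    for u in adj:
        # Log of each incoming message: integral of eta_{v->u}(y) omega(x, y) dy.
        log_in = {v: np.log(Om @ (w * eta[(v, u)])) for v in adj[u]}
        total = -deg[u] * mean_field / (2.0 * m) + sum(log_in.values())
        for v in adj[u]:
            log_b = total - log_in[v]         # leave out the message from v
            b = np.exp(log_b - log_b.max())
            new_eta[(u, v)] = b / np.sum(w * b)   # normalize to unit integral
        b = np.exp(total - total.max())
        new_q[u] = b / np.sum(w * b)          # assumed one-node marginal
    return new_eta, new_q

# Toy example (all values illustrative): a triangle with one pendant node.
t, w = np.polynomial.legendre.leggauss(16)
t, w = 0.5 * (t + 1.0), 0.5 * w
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
deg = {u: len(nb) for u, nb in adj.items()}
m = sum(deg.values()) / 2
Om = 0.2 + np.exp(-4.0 * np.abs(t[:, None] - t[None, :]))
rng = np.random.default_rng(0)

def random_density():
    r = rng.random(t.size) + 0.5
    return r / np.sum(w * r)

eta = {(u, v): random_density() for u in adj for v in adj[u]}
q = {u: random_density() for u in adj}
for _ in range(50):
    eta, q = bp_sweep(adj, deg, m, Om, w, eta, q)
```

Working with logarithms of the incoming messages avoids underflow for high-degree nodes, and dividing by the weighted sum plays the role of the normalizing constant.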

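Once we have the values of the beliefs, the crucial two-node marginal
probability $q_{uv}(x, y)$ is given by

$$ q_{uv}(x, y) = \frac{\eta_{u \to v}(x)\, \eta_{v \to u}(y)\, \omega(x, y)}{\iint \eta_{u \to v}(x)\, \eta_{v \to u}(y)\, \omega(x, y)\, dx\, dy}. \tag{16} $$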
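On a quadrature grid, Eq. (16) and the sums of Eq. (11) reduce to a few array operations, sketched below. The names are hypothetical, and the sketch assumes that the sum over $uv$ in Eq. (11) runs over ordered pairs, so that each undirected edge contributes both $q_{uv}(x, y)$ and its transpose; the beliefs in the usage lines are random placeholders rather than converged values.

```python
import numpy as np

def two_node_marginal(eta_uv, eta_vu, Om, w):
    """Eq. (16) on a grid: entry [a, b] approximates q_uv(x_a, y_b)."""
    joint = eta_uv[:, None] * eta_vu[None, :] * Om
    return joint / np.sum(w[:, None] * w[None, :] * joint)

def accumulate_mu_nu(edges, deg, eta, q, Om, w):
    """Eq. (11): mu = sum_uv a_uv q_uv and nu = sum_u d_u q_u.

    edges lists each undirected edge once as (u, v); both orderings are
    added to mu (an assumption of this sketch), which keeps mu symmetric.
    """
    mu = np.zeros((w.size, w.size))
    for u, v in edges:
        q_uv = two_node_marginal(eta[(u, v)], eta[(v, u)], Om, w)
        mu += q_uv + q_uv.T
    nu = sum(deg[u] * q[u] for u in deg)
    return mu, nu

# Placeholder inputs on a three-node path 0 - 1 - 2.
t, w = np.polynomial.legendre.leggauss(12)
t, w = 0.5 * (t + 1.0), 0.5 * w
edges = [(0, 1), (1, 2)]
deg = {0: 1, 1: 2, 2: 1}
Om = 0.2 + np.exp(-4.0 * np.abs(t[:, None] - t[None, :]))
rng = np.random.default_rng(0)

def random_density():
    r = rng.random(t.size) + 0.5
    return r / np.sum(w * r)

eta = {}
for u, v in edges:
    eta[(u, v)] = random_density()
    eta[(v, u)] = random_density()
q = {u: random_density() for u in deg}
mu, nu = accumulate_mu_nu(edges, deg, eta, q, Om, w)
```

Only node pairs joined by an edge enter `mu`, so the cost of this step grows with the number of edges rather than with the number of node pairs.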


Armed with these quantities for every node pair connected by an edge, we can
evaluate $\mu(x, y)$ and $\nu(x)$ from Eq. (11), then iterate Eqs. (12)
and (13) to compute new values of the parameters $c_{jk}$, and repeat.

FIG. 1: (a) Left: density plot of the posterior marginal probability densities
$q_u(x)$ that node $u$ has node parameter $x$ for an application of our
algorithm to a 600-node stochastic block model with three groups. Colors
indicate the probabilities and there are 600 columns, one for each node. Right:
density plot of the edge function $\omega(x, y)$. (b) The neural network of the
worm C. elegans, drawn in real space, as it falls within the body of the worm.
Colors represent the average values of the node parameters $x_u$ inferred for
each neuron by our algorithm. (c) Network representation of the interstate
highways of the contiguous United States. Again, node colors represent the
average node parameters $x_u$.

We give three example applications of our methods, 
one to a computer-generated benchmark network and 
the others to real-world networks displaying nontrivial 
latent-space structure that is readily uncovered by our 
algorithm. 

For our first example, we use a computer-generated test network created using
the standard stochastic block model, with $n = 600$ nodes divided into three
equally sized groups of 200 nodes each, with probabilities
$p_\mathrm{in} = c_\mathrm{in}/n$ and $p_\mathrm{out} = c_\mathrm{out}/n$ for
edges between nodes in the same and different groups respectively and
$c_\mathrm{in} = 15$, $c_\mathrm{out} = 3$. Figure 1a shows a density plot of
the marginal probability distributions $q_u(x)$ on the node parameters
calculated by our algorithm using a degree-4 (quartic) polynomial
representation of the edge function $\omega$. (We also used quartic
representations for the other examples below.) The plot consists of 600
columns, one for each node, color coded to show the value of $q_u(x)$ for the
corresponding node. As the plot shows, the algorithm has found the three known
groups in the network, placing them at three widely spaced points in the latent
space of the node parameters. (In this case, the first group is placed in the
middle, the second at the top, and the third at the bottom, but all orders are
equivalent.) We also show a plot of the inferred edge function $\omega(x, y)$,
which in this case has a heavy band along the diagonal, indicating
“assortative” structure, in which nodes are primarily connected to others in
the same group.

Our second example is a real-world network, the neural network of the nematode
(roundworm) C. elegans, which has been mapped in its entirety using electron
microscopy [15], and contains a total of 299 neurons. The worm has a long
tubular body, with neurons arranged not just in its head but along its entire
length. Neurons tend to be connected to others near to them, so we expect
spatial position to play the role of a latent variable and our algorithm should
be able to infer the positions of neurons by examining the structure of the
network. Figure 1b shows that indeed this is the case. The figure shows the
network as it appears within the body of the worm, with nodes colored according
to the mean values of the node parameters found by the algorithm, and we can
see a strong correlation between node color and position. The largest number of
nodes is concentrated in the head, mostly colored red in the figure; others
along the body appear in blue and green. If we did not know the physical
positions of the nodes in this case, or if we did not know the correlation
between position and network structure, we could discover it using this
analysis.

Our third example, shown in Fig. 1c, is an analysis of the network of
interstate highways in the contiguous United States. This network is embedded
in geometric space, the surface of the Earth. Again we would expect our
algorithm to find this embedding and indeed it does. The colors of the nodes
represent the mean values of their node parameters and there is a clear
correspondence between node color and position, with the inferred node
parameters being lowest in the north-east of the country and increasing in all
directions from there. (Note that even though the portion of the network
colored in red and orange appears much larger than the rest, it is in fact
about the same size in terms of number of nodes because of the higher density
of nodes in the north-east.) The true underlying space in this case has two
dimensions, where our model has only one, and this suggests a potential
generalization to latent spaces with two (or more) dimensions. It turns out
that such a generalization is possible and straightforward, but we leave the
developments for future work.











To summarize, we have in this paper described a generalized form of community
structure in networks in which, instead of placing network nodes in discrete
groups, we place them at positions in a continuous space and edge probabilities
depend in a general manner on those positions. We have given a computationally
efficient algorithm for inferring such structure from empirical network data,
based on a combination of an EM algorithm and belief propagation, and find that
it successfully uncovers nontrivial structural information about both
artificial and real networks in example applications.
1407207 and by the University of Bremen under funding